
feat(orchestrator): add backup-plugin-db skill for WAL-safe nightly snapshots #4

Open
evannadeau wants to merge 3 commits into SpawnBox-dev:main from evannadeau:feat/orchestrator-backup-plugin-db

Conversation

@evannadeau

Summary

Adds a new backup-plugin-db skill under the orchestrator plugin that installs nightly point-in-time snapshots of the plugin's SQLite DB(s) using SQLite's online backup API (WAL-safe under a running MCP server), written atomically to a user-chosen destination.

Motivation: the plugin runs against two databases, ~/.claude/orchestrator/global.db and <project>/.orchestrator/project.db, and the bulk of stored knowledge typically lives in the project DBs. Most users discover this only after losing data.

What's in the skill

  • scripts/snapshot-plugin-db.py — single-source snapshot script.
    • SQLite online backup API (concurrent-write safe).
    • Atomic tempfile-then-rename writes (cloud-sync clients never see a torn file).
    • Per-source-stem snapshot filenames (global-YYYY-MM-DD.db / project-YYYY-MM-DD.db) so the same destination can host multiple DBs without collision.
    • Optional --retain-days N retention, scoped to the source-stem pattern so it cannot accidentally delete adjacent files.
    • Refuses to write the snapshot inside the source DB's directory.
    • User-chosen destination only — --cloud-root or $CLAUDE_ORCHESTRATOR_BACKUP_ROOT, no implicit defaults, hard-fails with a clear message when neither is set.
  • scripts/install-snapshot-timer.sh — Linux/macOS/WSL helper. Prefers systemd-user; falls back to cron when the user bus is unavailable. Idempotent on --name. Per-install retention. WSL2 linger advisory.
  • scripts/install-snapshot-task.ps1 — Windows helper. Registers a Scheduled Task, prefers pyw.exe to avoid console flashes, validates -Time/-RetainDays, idempotent on -TaskName.
  • SKILL.md — walkthrough that installs one timer per DB the user cares about, with a Prerequisites section covering the Python 3.10+ requirement and the Windows MS-Store-stub trap.
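
For reviewers who want the core mechanics in one place, here is a minimal sketch of the online-backup-then-atomic-rename flow described above. It is not the shipped scripts/snapshot-plugin-db.py: the function name `snapshot`, the temp-file naming, and the error handling are assumptions made for illustration. Only `Connection.backup()`, the temp+rename approach, and the `<stem>-YYYY-MM-DD.db` naming come from this PR.

```python
import os
import sqlite3
import tempfile
from datetime import date
from pathlib import Path


def snapshot(source: Path, dest_dir: Path) -> Path:
    """Hypothetical helper: copy `source` to dest_dir/<stem>-YYYY-MM-DD.db."""
    if not source.is_file():
        raise FileNotFoundError(source)
    dest_dir.mkdir(parents=True, exist_ok=True)
    final = dest_dir / f"{source.stem}-{date.today().isoformat()}.db"

    # The temp file lives in the destination directory so that the final
    # os.replace() stays on one filesystem and swaps the file in one step.
    fd, tmp_name = tempfile.mkstemp(
        dir=dest_dir, prefix=f".{source.stem}-", suffix=".tmp"
    )
    os.close(fd)
    try:
        src = sqlite3.connect(source)
        dst = sqlite3.connect(tmp_name)
        try:
            # SQLite online backup API: takes a consistent copy even while
            # another connection is writing to the source database.
            src.backup(dst)
        finally:
            dst.close()
            src.close()
        os.replace(tmp_name, final)
    except BaseException:
        # Never leave a partial temp file behind for a sync client to pick up.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return final
```

A sync client watching the destination only ever sees the dot-prefixed temp file or the finished snapshot, never a half-written `.db` under its final name.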

Design choices worth flagging

  • Two-DB coverage by composition, not a flag. The user runs the helper once per DB they care about, each with a distinct --name/-TaskName and optionally a distinct --retain-days. Keeps the script single-purpose and lets users mix retention policies between global and project DBs.
  • No default destination, no probing for OneDrive/Dropbox/Drive/iCloud. Probing picks surprising destinations on machines with multiple sync clients, and it can appear to succeed when the chosen folder isn't actually syncing. Requiring an explicit destination is more annoying but more honest.
  • Snapshots are safe to cloud-sync; the live DB is not. SKILL.md spells this out — SQLite file locking does not cross cloud-sync boundaries, and the WAL sidecar desyncs from the main file. Snapshot, don't mirror.
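
The explicit-destination rule and the refuse-inside-source guard could look roughly like the sketch below. The function name `resolve_destination`, the `env` parameter, and the exact messages are hypothetical. Treating a nested subdirectory of the source directory as "inside" is also an assumption; the PR only states that the script refuses to write into the source DB's directory.

```python
import os
from pathlib import Path

ENV_VAR = "CLAUDE_ORCHESTRATOR_BACKUP_ROOT"


def resolve_destination(cloud_root, source, env=None):
    """Hypothetical helper: choose the snapshot directory or fail loudly."""
    env = os.environ if env is None else env
    # The CLI value wins; the environment variable is the only fallback.
    raw = cloud_root or env.get(ENV_VAR)
    if not raw:
        # No probing and no implicit default: stop with a clear message.
        raise SystemExit(
            f"No destination set: pass --cloud-root or export {ENV_VAR}."
        )
    dest = Path(raw).expanduser().resolve()
    src_dir = Path(source).expanduser().resolve().parent
    # Assumption: a nested subdirectory also counts as "inside the source".
    if dest == src_dir or src_dir in dest.parents:
        raise SystemExit(
            f"Refusing to write snapshots inside the source directory: {src_dir}"
        )
    return dest
```

Failing before any file is opened keeps the "no silent success" property: a misconfigured timer surfaces as a non-zero exit instead of a snapshot landing somewhere unexpected.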

Testing

End-to-end validated on two fresh hosts using self-contained agent prompts that synthesize a source DB and exercise install/run/retention/coexistence/idempotency/teardown:

  • WSL2 (Ubuntu 24.04, systemd-user): 8/8 PASS.
  • Windows (PowerShell 7, Python 3.14 from Microsoft Store): 8/8 PASS.

The snapshot script itself was also smoke-tested against real plugin DBs on the authoring host (28-note global, 150-note project, both verified round-trippable via sqlite3 PRAGMA integrity_check).

Out of scope (deliberately)

  • Restore tooling: SKILL.md documents the read-back drill (PRAGMA integrity_check on the snapshot) but doesn't ship a one-shot restore script. The destructive half of the drill is a user action.
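
The non-destructive half of that drill is small enough to sketch here. This is an illustration, not something the PR ships: the function name `snapshot_is_intact` is hypothetical, and opening the snapshot read-only via a `mode=ro` URI is a choice made for this example.

```python
import sqlite3
from pathlib import Path


def snapshot_is_intact(snapshot):
    """Hypothetical helper: True if PRAGMA integrity_check reports 'ok'."""
    # Open read-only so that checking a snapshot cannot modify it.
    uri = Path(snapshot).resolve().as_uri() + "?mode=ro"
    try:
        con = sqlite3.connect(uri, uri=True)
    except sqlite3.Error:
        return False  # missing or unopenable file
    try:
        rows = con.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError:
        return False  # not a SQLite database, or too damaged to read
    finally:
        con.close()
    # A healthy database yields exactly one row containing 'ok'.
    return rows == [("ok",)]
```

Copying a verified snapshot back over a live DB remains a manual step, consistent with keeping the destructive half of the drill a user action.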

Test plan

  • Snapshot script: WAL-safe smoke test against live plugin DB with concurrent MCP writes.
  • --retain-days prunes only files matching the source-stem pattern (verified with adjacent non-matching files).
  • Two installs against the same destination coexist (distinct --name, distinct retention).
  • Idempotent re-install updates the schedule rather than duplicating it.
  • WSL2 fresh host: install via helper, force-run, verify snapshot lands and is non-zero, teardown clean.
  • Windows fresh host: install via helper, force-run, verify snapshot lands and is non-zero, teardown clean.
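
To make the retention check above concrete, here is one way stem-scoped pruning could work. It is a sketch under stated assumptions: the function name `prune_snapshots` is hypothetical, and it takes each snapshot's age from the date embedded in its filename, which the shipped script may or may not do.

```python
import re
from datetime import date, timedelta
from pathlib import Path


def prune_snapshots(dest_dir, stem, retain_days, today=None):
    """Hypothetical helper: delete <stem>-YYYY-MM-DD.db files past retention."""
    today = today or date.today()
    cutoff = today - timedelta(days=retain_days)
    # Only exact <stem>-YYYY-MM-DD.db names qualify, so adjacent files such
    # as other stems, .bak copies, or unrelated documents are never touched.
    pattern = re.compile(rf"{re.escape(stem)}-(\d{{4}}-\d{{2}}-\d{{2}})\.db")
    removed = []
    for path in sorted(Path(dest_dir).iterdir()):
        match = pattern.fullmatch(path.name)
        if match is None or not path.is_file():
            continue
        try:
            stamp = date.fromisoformat(match.group(1))
        except ValueError:
            continue  # date-shaped but invalid, e.g. 2026-13-40: leave it alone
        if stamp < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Because the stem is part of the pattern, a 7-day policy on global snapshots and a longer policy on project snapshots can share one destination without interfering.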

🤖 Generated with Claude Code

Evan Nadeau and others added 3 commits May 12, 2026 08:09
…napshots

Ships a skill that installs daily point-in-time snapshots of the
orchestrator plugin's SQLite databases. Mirrors the install-launchers
skill pattern (SKILL.md + scripts/) so the canonical content lives
inside the plugin and gets installed into the user's environment on
demand.

What's included
- SKILL.md walking through setup. The destination is entirely the
  user's choice (no defaults inferred); the skill suggests cloud-sync
  folders or backed-up local paths as examples.
- scripts/snapshot-plugin-db.py — WAL-safe online backup via SQLite's
  Connection.backup() API, atomic temp+rename writes, optional
  per-source retention (--retain-days), and a refuse-to-write-inside-
  source guard.
- scripts/install-snapshot-timer.sh — bash helper that installs a
  systemd-user timer when available, or a tagged crontab entry as a
  fallback. Idempotent on --name. Documents WSL2 linger requirement.
- scripts/install-snapshot-task.ps1 — PowerShell helper that registers
  a Windows Scheduled Task. Idempotent on -TaskName. Prefers pyw.exe
  for windowless runs.

Two-database coverage
The plugin uses two SQLite files — global.db (cross-project) and
project.db (per-project). The SKILL.md and helpers are explicit about
this and walk users through installing one timer per DB they care
about (with distinct --name / -TaskName values and optional distinct
--retain-days policies). The most common omission this addresses is
backing up only the global DB and silently losing the project DB,
which typically holds the bulk of a user's knowledge.

Python is already a baseline dep for the plugin via sidecar/
(embed_server.py + requirements.txt), so this adds no new runtime
requirement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…LL.md

Calls out Python 3.10+, the Windows Microsoft Store App-Execution-Alias
stub trap, a writable destination, and the systemd-user/cron requirement
on Linux/macOS. Surfaces the most common upstream-user wall (Store stub
on a fresh Windows box) before the install step rather than after a
confusing helper failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…SKILL.md

Documents where snapshot failures show up by default on each platform
(journalctl / cron mail / Task Scheduler LastTaskResult) and the
canonical pattern for attaching alerting — systemd drop-in override on
Linux/macOS/WSL, RestartCount + event-triggered follow-up task on
Windows — without re-registering or modifying the shipped units.